I Claim:

1. A video processing apparatus comprising:

a. a memory; and 

b. one or more video processing modules, each video processing module coupled to 
the memory and comprising: 

i. a programmable array of processing elements, each processing element 
including local registers to provide data used in processing operations and 
to store results of the processing operations; 

ii. a block load and store unit coupled to the programmable array of 
processing elements to load, store, and send data transferred back and forth 
between the memory and the array of processing elements; 

iii. a global accumulation unit to accumulate the results of the processing
operations for each processing element; and 

iv. a local controller to provide instructions and parameters related to the 
processing operations and data transfer. 

2. The apparatus of claim 1 wherein the array of processing elements comprises a two- 
dimensional array. 

3. The apparatus of claim 2 wherein the two-dimensional array comprises a 4x4 array of 
processing elements. 

4. The apparatus of claim 2 wherein the two-dimensional array comprises a single- 
instruction multiple-data array. 
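
As an illustration of claims 2 through 4 taken together, the following is a minimal sketch of a 4x4 array in which one instruction is applied to every processing element, each with its own operand. The names `make_array` and `broadcast`, and the use of a single `acc` field per element, are hypothetical and are not drawn from the claims.

```python
ROWS, COLS = 4, 4  # 4x4 array, as recited in claim 3


def make_array(initial=0):
    """Build a 4x4 grid of processing elements (each a hypothetical dict)."""
    return [[{"acc": initial} for _ in range(COLS)] for _ in range(ROWS)]


def broadcast(array, instruction, operands):
    """Apply the same instruction to every element, each with its own data.

    This models single-instruction multiple-data behaviour in software only;
    real hardware would execute all sixteen elements in lockstep.
    """
    for r in range(ROWS):
        for c in range(COLS):
            pe = array[r][c]
            pe["acc"] = instruction(pe["acc"], operands[r][c])
    return array
```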

5. The apparatus of claim 1 wherein each processing element includes a plurality of vector 
registers and a plurality of block registers.

6. The apparatus of claim 5 wherein each vector register and each block register is
configured to hold 8 8-bit data elements as a two-dimensional 2x4 block of pixels or 4 
16-bit data elements as a one-dimensional vector. 
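
Both layouts recited in claim 6 occupy the same 64 bits (8 x 8 bits = 4 x 16 bits). The sketch below shows one way such a register image might be packed and reinterpreted. The row-major ordering, the little-endian byte order, and all function names are assumptions made for illustration; the claims do not specify them.

```python
import struct

REGISTER_BYTES = 8  # 64 bits in either layout


def pack_block(rows):
    """Pack a 2x4 block of 8-bit pixels (row-major order is assumed)."""
    if len(rows) != 2 or any(len(r) != 4 for r in rows):
        raise ValueError("expected a 2x4 block")
    return struct.pack("<8B", *rows[0], *rows[1])


def unpack_block(reg):
    """Read a register image back as a 2x4 block of 8-bit pixels."""
    vals = struct.unpack("<8B", reg)
    return [list(vals[:4]), list(vals[4:])]


def pack_vector(elems):
    """Pack four 16-bit elements (little-endian order is assumed)."""
    if len(elems) != 4:
        raise ValueError("expected 4 elements")
    return struct.pack("<4H", *elems)


def unpack_vector(reg):
    """Read a register image back as a one-dimensional 4-element vector."""
    return list(struct.unpack("<4H", reg))
```

Under these assumptions, the same eight bytes can be written as a block and read back as a vector, which is the sense in which one register serves both formats.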

7. The apparatus of claim 1 wherein the block load and store unit comprises one or more 
arrays of exchange registers. 

8. The apparatus of claim 7 wherein each array of exchange registers is a two-dimensional 
array. 

9. The apparatus of claim 1 wherein the local controller provides control commands to each 
processing element, performing control and processing operations on data stored within 
the local controller, and transfers data between the local controller and other registers 
within one video module. 

10. The apparatus of claim 1 further comprising a system controller coupled to the memory 
and to the one or more video processing modules. 

11. The apparatus of claim 1 further comprising a direct, high-bandwidth data path to couple 
each of the video processing modules to the memory. 

12. The apparatus of claim 1 wherein each processing element further comprises a plurality 
of scalar registers. 

13. The apparatus of claim 1 wherein the block load and store unit sends data transferred 
back and forth between non-adjacent processing elements of the array of processing 
elements. 

14. The apparatus of claim 1 wherein each processing element includes a local accumulation 
register.

15. The apparatus of claim 1 wherein each processing element further comprises a plurality
of control registers including a PE mask register, a condition register, a block base 
register, and a vector base register. 

16. The apparatus of claim 1 wherein the block load and store unit sends data transferred 
back and forth between the local registers in the processing elements, the global 
accumulation unit, and the local controller. 

17. A method of processing video comprising: 

a. configuring a video stream into data blocks; 

b. loading data blocks from memory to a first array of exchange registers; 

c. loading data blocks from the first array of exchange registers to a programmable 
array of processing elements, wherein each processing element within the array of 
processing elements includes an array of block registers, an array of vector 
registers, and a local accumulator, the data blocks are loaded from the first array 
of exchange registers to the array of block registers; 

d. loading the data blocks from the array of block registers to the array of vector 
registers; 

e. processing the data blocks loaded in the array of vector registers and storing 
results in the corresponding local accumulator for each processing element; 

f. accumulating the results stored in the local accumulators in a global accumulator, 
thereby forming accumulated results; and 

g. moving the accumulated results into a local controller. 
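
Steps a through g of claim 17 can be traced in software with a minimal sketch such as the following. It is a model of the data flow only, not of the claimed hardware. The class and function names, the dictionary standing in for the local controller, and the default processing operation (a plain sum of pixel values) are all assumptions; the claim does not name a particular operation.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessingElement:
    block_regs: list = field(default_factory=list)
    vector_regs: list = field(default_factory=list)
    local_acc: int = 0


def split_into_blocks(frame, rows=2, cols=4):
    """Step a: configure a frame (a list of pixel rows) into data blocks."""
    blocks = []
    for r in range(0, len(frame), rows):
        for c in range(0, len(frame[0]), cols):
            blocks.append([row[c:c + cols] for row in frame[r:r + rows]])
    return blocks


def run_pass(frame, num_pes=16, op=sum):
    """Trace one pass through steps a-g; `op` is a hypothetical operation."""
    blocks = split_into_blocks(frame)                         # step a
    if len(blocks) > num_pes:
        raise ValueError("more blocks than processing elements")
    exchange = list(blocks)                                   # step b
    pes = [ProcessingElement() for _ in exchange]
    for pe, block in zip(pes, exchange):
        pe.block_regs.append(block)                           # step c
        flat = [p for row in pe.block_regs[-1] for p in row]
        pe.vector_regs.append(flat)                           # step d
        pe.local_acc = op(pe.vector_regs[-1])                 # step e
    global_acc = sum(pe.local_acc for pe in pes)              # step f
    controller = {"accumulated": global_acc}                  # step g
    return controller, pes
```

For example, a 4x8 frame splits into four 2x4 blocks, so four processing elements each hold one block and contribute one local result to the global accumulator.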

18. The method of claim 17 further comprising storing results from processing the data
blocks in the array of vector registers, and loading the results stored in the array of vector 
registers in the array of block registers. 

19. The method of claim 18 further comprising loading the results in the array of block
registers into a second array of exchange registers, and loading the results from the array 
of block registers into memory. 




PATENT 
SONY-27300 

20. The method of claim 19 wherein each of the first and second array of exchange registers 
is a two-dimensional array. 

21. The method of claim 18 further comprising loading the results in the array of block
registers into a second array of exchange registers, and loading the results in the second 
array of exchange registers into another array of block registers included within non- 
adjacent processing elements to the processing elements including the array of block 
registers. 

22. The method of claim 18 further comprising loading the results in the array of block 
registers into another array of block registers included within a processing element 
adjacent to the processing element including the array of block registers. 

23. The method of claim 17 wherein the array of processing elements comprises a two- 
dimensional array. 

24. The method of claim 23 wherein the two-dimensional array comprises a 4x4 array of 
processing elements. 

25. The method of claim 23 wherein the two-dimensional array comprises a single-instruction 
multiple-data array. 

26. The method of claim 17 wherein each vector register and each block register is
configured to hold 8 8-bit data elements as a two-dimensional 2x4 block of pixels or 4 
16-bit data elements as a one-dimensional vector. 

27. The method of claim 17 wherein each processing element further comprises a plurality of 
scalar registers such that processing the data blocks includes processing data blocks 
loaded from the array of block registers and data loaded from the array of scalar registers. 

28. The method of claim 17 wherein the local controller utilizes the accumulated results to
make control decisions related to video processing.

29. A video processing apparatus comprising:

a. means for configuring a video stream into data blocks; 

b. means for loading data blocks from memory to a first array of exchange registers,
the means for loading data blocks from memory coupled to the means for
configuring;

c. means for loading data blocks from the first array of exchange registers to a
programmable array of processing elements, the means for loading data blocks 
from the first array of exchange registers coupled to the means for loading data 
blocks from memory, wherein each processing element within the array of 
processing elements includes an array of block registers and an array of vector 
registers, the data blocks are loaded from the first array of exchange registers to 
the array of block registers; 

d. means for loading the data blocks from the array of block registers to the array of 
vector registers, the means for loading the data blocks from the array of block 
registers coupled to the means for loading data blocks from the first array of 
exchange registers; 

e. means for processing the data blocks loaded in the array of vector registers and 
storing results in the corresponding local accumulator for each processing 
element, the means for processing coupled to the means for loading the data 
blocks from the array of block registers; 

f. means for accumulating the results stored in the local accumulators in a global 
accumulator, thereby forming accumulated results, the means for accumulating 
coupled to the means for processing; and 

g. means for moving the accumulated results into a local controller, the means for 
moving coupled to the means for accumulating. 

30. The apparatus of claim 29 further comprising means for storing results from processing
the data blocks in the array of vector registers, and means for loading the results stored in 
the array of vector registers in the array of block registers. 

31. The apparatus of claim 30 further comprising means for loading the results in the array of
block registers into a second array of exchange registers, and means for loading the 
results from the array of block registers into memory.

32. The apparatus of claim 31 wherein each of the first and second array of exchange
registers is a two-dimensional array. 

33. The apparatus of claim 30 further comprising means for loading the results in the array of 
block registers into a second array of exchange registers, and means for loading the 
results in the second array of exchange registers into another array of block registers 
included within non-adjacent processing elements to the processing elements including 
the array of block registers. 

34. The apparatus of claim 30 further comprising means for loading the results in the array of 
block registers into another array of block registers included within a processing element 
adjacent to the processing element including the array of block registers. 

35. The apparatus of claim 29 wherein the array of processing elements comprises a two- 
dimensional array. 

36. The apparatus of claim 35 wherein the two-dimensional array comprises a 4x4 array of 
processing elements. 

37. The apparatus of claim 35 wherein the two-dimensional array comprises a single- 
instruction multiple-data array. 

38. The apparatus of claim 29 wherein each vector register and each block register is 
configured to hold 8 8-bit data elements as a two-dimensional 2x4 block of pixels or 4 
16-bit data elements as a one-dimensional vector. 

39. The apparatus of claim 29 wherein each processing element further comprises a plurality 
of scalar registers such that processing the data blocks includes processing data blocks 
loaded from the array of block registers and data loaded from the array of scalar registers.

40. The apparatus of claim 29 wherein the local controller utilizes the accumulated results to 
make control decisions related to video processing.

41. A programmable array of processing elements to process video, each processing element
including local registers to store video data blocks received from a main memory, to 
process the received video data blocks, and to store results of processing the video data 
blocks. 

42. The programmable array of processing elements of claim 41 coupled to a local controller 
to provide instructions and parameters related to data transfer and processing of the video 
data blocks received from the main memory. 

43. The programmable array of processing elements of claim 42 wherein the local controller 
provides control commands to each processing element, performing control and 
processing operations on data stored within the local controller, and transfers data 
between the local controller and other registers within one video module. 

44. The programmable array of processing elements of claim 41 wherein the array of 
processing elements comprises a two-dimensional array. 

45. The programmable array of processing elements of claim 44 wherein the two-dimensional
array comprises a 4x4 array of processing elements. 

46. The programmable array of processing elements of claim 44 wherein the two-dimensional 
array comprises a single-instruction multiple-data array. 

47. The programmable array of processing elements of claim 41 wherein each processing 
element includes a plurality of vector registers and a plurality of block registers. 

48. The programmable array of processing elements of claim 47 wherein each vector register 
and each block register is configured to hold 8 8-bit data elements as a two-dimensional 
2x4 block of pixels or 4 16-bit data elements as a one-dimensional vector.

49. The programmable array of processing elements of claim 41 wherein each processing 
element further comprises a plurality of scalar registers.

50. The programmable array of processing elements of claim 41 wherein each processing
element includes a local accumulation register. 

51. The programmable array of processing elements of claim 41 wherein each processing
element further comprises a plurality of control registers including a PE mask register, a 
condition register, a block base register, and a vector base register. 